80 research outputs found

    Qualitative analysis of post-editing for high quality machine translation

    In the context of massive adoption of Machine Translation (MT) by human localization services in Post-Editing (PE) workflows, we analyze the activity of post-editing high-quality translations through a novel PE analysis methodology. We define and introduce a new unit for evaluating post-editing effort, the Post-Editing Action (PEA), for which we provide human evaluation guidelines and propose a process to evaluate these PEAs automatically. We applied this methodology to data sets from two technologically different MT systems and show that more than 35% of the remaining effort can be saved by introducing global PEAs and edit propagation.
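
    Purely as an illustration (not the paper's actual PEA definition; all names here are hypothetical), the sketch below approximates post-editing actions as word-level edit operations and counts operations that recur across segments, the kind of repetition that a global PEA with edit propagation could resolve in a single action.

```python
from collections import Counter
from difflib import SequenceMatcher

def edit_actions(mt: str, post_edit: str):
    """Approximate post-editing actions as word-level edit operations."""
    mt_tok, pe_tok = mt.split(), post_edit.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, mt_tok, pe_tok).get_opcodes():
        if tag != "equal":  # keep only replace/insert/delete operations
            ops.append((tag, tuple(mt_tok[i1:i2]), tuple(pe_tok[j1:j2])))
    return ops

def recurring_actions(segment_pairs):
    """Identical operations recurring across segments are candidates for a
    single 'global' action propagated to every affected segment."""
    counts = Counter()
    for mt, pe in segment_pairs:
        counts.update(edit_actions(mt, pe))
    return {op: n for op, n in counts.items() if n > 1}

pairs = [
    ("open the colour settings", "open the color settings"),
    ("the colour settings are hidden", "the color settings are hidden"),
]
print(recurring_actions(pairs))
# {('replace', ('colour',), ('color',)): 2} -> one propagated edit fixes both
```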

    Bilexical embeddings for quality estimation

    © 2017 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W17-4760. This work was supported by the QT21 project (H2020 No. 645452).

    Multimodal quality estimation for machine translation

    © 2020 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/2020.acl-main.114. We propose approaches to Quality Estimation (QE) for Machine Translation that explore both text and visual modalities for Multimodal QE. We compare various multimodality integration and fusion strategies. For both sentence-level and document-level predictions, we show that state-of-the-art neural and feature-based QE frameworks obtain better results when using the additional modality. This work was supported by funding from both the Bergamot project (EU H2020 Grant No. 825303) and the MultiMT project (EU H2020 ERC Starting Grant No. 678017).
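
    As a minimal sketch of one possible fusion strategy, assuming projected-and-concatenated text and image features (dimensions and class names are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LateFusionQE(nn.Module):
    """Toy sentence-level QE scorer fusing a text encoding with a visual
    feature vector by projection + concatenation."""
    def __init__(self, text_dim=768, visual_dim=2048, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_feats, visual_feats):
        fused = torch.cat(
            [self.text_proj(text_feats), self.visual_proj(visual_feats)], dim=-1
        )
        return self.scorer(fused).squeeze(-1)  # one quality score per sentence

model = LateFusionQE()
scores = model(torch.randn(4, 768), torch.randn(4, 2048))  # batch of 4 sentences
```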

    Sheffield systems for the English-Romanian translation task

    © 2016 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W16-2307. This work was supported by the QT21 project (H2020 No. 645452).

    Quality in, quality out: learning from actual mistakes

    © 2020 The Authors. Published by the European Association for Machine Translation. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://www.aclweb.org/anthology/2020.eamt-1.16/. Approaches to Quality Estimation (QE) of machine translation have shown promising results at predicting quality scores for translated sentences. However, QE models are often trained on noisy approximations of quality annotations derived from the proportion of post-edited words in translated sentences, instead of direct human annotations of translation errors. The latter is a more reliable ground truth but more expensive to obtain. In this paper, we present the first attempt to model the task of predicting the proportion of actual translation errors in a sentence while minimising the need for direct human annotation. For that purpose, we use transfer learning to leverage large-scale noisy annotations and small sets of high-fidelity human-annotated translation errors to train QE models. Experiments on four language pairs and translations obtained by statistical and neural models show consistent gains over strong baselines. This work was supported by the Bergamot project (EU H2020 Grant No. 825303).
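
    A minimal sketch of the two-stage transfer-learning recipe the abstract describes, assuming a toy feature-based regressor and random stand-in data in place of the real noisy and gold annotations:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, epochs, lr):
    """One regression phase: minimise MSE against quality labels."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, labels in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(feats).squeeze(-1), labels).backward()
            opt.step()

model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
# Stage 1: large set of noisy, post-editing-derived labels (HTER-like).
noisy = TensorDataset(torch.randn(512, 768), torch.rand(512))
train(model, DataLoader(noisy, batch_size=32), epochs=3, lr=1e-4)
# Stage 2: small set of high-fidelity human error proportions; a lower
# learning rate helps keep what was learnt from the noisy data.
gold = TensorDataset(torch.randn(64, 768), torch.rand(64))
train(model, DataLoader(gold, batch_size=16), epochs=10, lr=1e-5)
```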

    USFD’s phrase-level quality estimation systems

    © 2016 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W16-2386. Logacheva, V., Blain, F. and Specia, L. (2016) USFD’s phrase-level quality estimation systems. In: Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, Bojar, O., Buck, C., Chatterjee, R., Federmann, C. et al. (eds.) Stroudsburg, PA: Association for Computational Linguistics, pp. 800-805. This work was supported by the EXPERT (EU FP7 Marie Curie ITN No. 317471, Varvara Logacheva) and QT21 (H2020 No. 645452, Lucia Specia, Frédéric Blain) projects.

    Phrase level segmentation and labelling of machine translation errors

    © 2016 The Authors. Published by the European Language Resources Association (ELRA). This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://www.aclweb.org/anthology/L16-1356/. This paper presents our work towards a novel approach to Quality Estimation (QE) of machine translation based on sequences of adjacent words, so-called phrases. This new level of QE aims to provide a natural balance between word-level and sentence-level QE, which are respectively too fine-grained and too coarse for some applications. However, phrase-level QE poses an intrinsic challenge: how to segment a machine translation into the sequences of words (contiguous or not) that represent errors. We discuss three possible segmentation strategies to automatically extract erroneous phrases and evaluate them against phrase-level annotations produced by humans, using a new dataset collected for this purpose. The authors would like to thank all the annotators who helped to create the first version of gold-standard annotations at phrase level. This work was supported by the QT21 (H2020 No. 645452, Lucia Specia, Frédéric Blain) and EXPERT (EU FP7 Marie Curie ITN No. 317471, Varvara Logacheva) projects.
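
    One simple conceivable strategy, sketched here as an assumption rather than one of the paper's three, merges runs of adjacent words tagged BAD into a single contiguous erroneous phrase:

```python
def bad_phrases(tokens, tags):
    """Contiguous-runs heuristic: merge adjacent BAD-tagged words into one
    erroneous phrase (the contiguous variant of phrase segmentation)."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "BAD":
            current.append(tok)
        elif current:  # a run of BAD words just ended
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

tokens = "the cat sat in on the mat".split()
tags = ["OK", "OK", "OK", "BAD", "BAD", "OK", "OK"]
print(bad_phrases(tokens, tags))  # ['in on']
```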

    Sheffield submissions for the WMT18 quality estimation shared task

    © 2018 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W18-6463. In this paper we present the University of Sheffield submissions to the WMT18 Quality Estimation shared task. We discuss our submissions to all four sub-tasks; ours is the only team to participate in all language pairs and variations (37 combinations). Our systems show competitive results and outperform the baseline in nearly all cases. Carolina Scarton is supported by the EC project SIMPATICO (H2020-EURO-6-2015, grant number 692819). Frederic Blain is supported by the Amazon Academic Research Awards program.

    Tailoring Domain Adaptation for Machine Translation Quality Estimation

    While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues -- data scarcity and domain mismatch -- this paper combines domain adaptation and data augmentation within a robust QE system. Our method first trains a generic QE model and then fine-tunes it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and superior performance in zero-shot learning scenarios compared to state-of-the-art baselines. Accepted to EAMT 2023.
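
    The abstract does not spell out how generic knowledge is retained during fine-tuning; one common mechanism, shown below purely as a hedged sketch with stand-in data, is an L2-SP-style penalty that pulls the fine-tuned weights back toward the generic checkpoint:

```python
import torch
import torch.nn as nn

# Assume `model` was already trained on generic-domain QE data.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
generic = {k: v.clone() for k, v in model.state_dict().items()}  # frozen snapshot

def l2_sp_penalty(model, reference, strength=0.01):
    """Pull fine-tuned weights back toward the generic snapshot, one common
    way to retain generic knowledge during in-domain fine-tuning."""
    return strength * sum(
        (p - reference[name]).pow(2).sum() for name, p in model.named_parameters()
    )

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
feats, labels = torch.randn(16, 768), torch.rand(16)  # stand-in in-domain batch
opt.zero_grad()
loss = nn.functional.mse_loss(model(feats).squeeze(-1), labels)
(loss + l2_sp_penalty(model, generic)).backward()
opt.step()
```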